As technology becomes more integrated into people’s lives, fewer people are turning to books as a source of entertainment. When the COVID pandemic hit, people turned back to books after a steady decline in the number of readers since the beginning of the twenty-first century. With everyone stuck at home and little else to do, book sales saw a massive increase, especially for eBooks, which are more portable and accessible. Among young adult readers, the creation of BookTok also brought many readers back to the hobby and introduced new readers through its online community.
This report examines Leigh Bardugo’s Grishaverse novels and their trends in popularity over the years, as not only is she one of my personal favorite authors, but she is also one of the most prominent young adult fantasy authors right now. This report hopes to discover:
This dataset is filtered to only include the seven Grishaverse novels (Shadow and Bone, Siege and Storm, Ruin and Rising, Six of Crows, Crooked Kingdom, King of Scars, and Rule of Wolves).
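The filtering step described above could be sketched roughly as follows. This is a minimal illustration on a toy data frame, not the report's actual script: the column names (`Title`, `Creator`, `Checkouts`) and the example rows are assumptions. Because the same book can appear under slightly different titles depending on material type, the sketch matches titles by pattern rather than by exact equality.

```r
library(dplyr)

# Toy stand-in for the library checkout data; column names and rows are assumed.
checkouts <- tibble(
  Title = c("Shadow and Bone", "Six of crows / Leigh Bardugo.",
            "Ninth House", "Rule of Wolves (unabridged)", "Dune"),
  Creator = c("Leigh Bardugo", "Bardugo, Leigh", "Leigh Bardugo",
              "Leigh Bardugo", "Frank Herbert"),
  Checkouts = c(40, 25, 30, 12, 50)
)

grishaverse_titles <- c(
  "Shadow and Bone", "Siege and Storm", "Ruin and Rising",
  "Six of Crows", "Crooked Kingdom", "King of Scars", "Rule of Wolves"
)

# One alternation pattern covering all seven titles.
title_pattern <- paste(grishaverse_titles, collapse = "|")

# Keep rows by Bardugo whose title contains one of the seven names,
# ignoring case so that catalog variants still match.
grishaverse <- checkouts %>%
  filter(
    grepl("Bardugo", Creator, ignore.case = TRUE),
    grepl(title_pattern, Title, ignore.case = TRUE)
  )

print(grishaverse)
```

One caveat with pattern matching is that a boxed set or companion title containing one of these names would also be kept, so the matched titles are worth checking by eye.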
An introduction of the data and a description of the trends/books/items you are choosing to analyze (and why!)
When I saw that we were analyzing library data in class, I got super excited. I’m an avid reader, but I’m also super intrigued by popular book trends, especially following the pandemic when BookTok took off and reshaped the reading landscape. For my project, I chose to look at Leigh Bardugo’s works in particular and their trends in popularity, as not only is she one of my favorite authors, but her books only seem to increase in popularity over the years.
Write a summary paragraph of findings that includes the 5 values calculated from your summary information R script
These will likely be calculated using your DPLYR skills, answering questions such as:
What is the average number of checkouts for each item?
What is the month or year with the most/least checkouts for a book that you’re interested in?
What is the month or year with the most/least checkouts for ebooks?
How has the number of print book checkouts changed over time?
What is the most popular Grishaverse novel?
What is the most popular Grishaverse series?
How many checkouts do Bardugo’s Grishaverse novels receive on average per year?
Which month from 2017 to 2022 had the most checkouts?
What is the most popular material type for her books?
Feel free to calculate and report values that you find relevant.
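As a rough sketch of how questions like these might be answered with dplyr, the example below computes four of the values on a small toy data frame. The column names (`Title`, `MaterialType`, `CheckoutYear`, `CheckoutMonth`, `Checkouts`) and all numbers are assumptions for illustration, not results from the report.

```r
library(dplyr)

# Toy data; column names and values are assumed for illustration only.
grishaverse <- tibble(
  Title = c("Shadow and Bone", "Shadow and Bone", "Six of Crows",
            "Six of Crows", "King of Scars", "Shadow and Bone"),
  MaterialType = c("EBOOK", "BOOK", "EBOOK", "AUDIOBOOK", "BOOK", "EBOOK"),
  CheckoutYear = c(2020, 2021, 2020, 2021, 2021, 2021),
  CheckoutMonth = c(3, 5, 3, 5, 10, 10),
  Checkouts = c(10, 30, 20, 15, 5, 50)
)

# Most popular novel by total checkouts.
top_title <- grishaverse %>%
  group_by(Title) %>%
  summarise(total = sum(Checkouts), .groups = "drop") %>%
  slice_max(total, n = 1) %>%
  pull(Title)

# Average number of checkouts per year (mean of the yearly totals).
avg_per_year <- grishaverse %>%
  group_by(CheckoutYear) %>%
  summarise(total = sum(Checkouts), .groups = "drop") %>%
  summarise(avg = mean(total)) %>%
  pull(avg)

# Year and month with the most checkouts.
peak_month <- grishaverse %>%
  group_by(CheckoutYear, CheckoutMonth) %>%
  summarise(total = sum(Checkouts), .groups = "drop") %>%
  slice_max(total, n = 1)

# Most popular material type by total checkouts.
top_material <- grishaverse %>%
  group_by(MaterialType) %>%
  summarise(total = sum(Checkouts), .groups = "drop") %>%
  slice_max(total, n = 1) %>%
  pull(MaterialType)
```

Note that `slice_max()` keeps ties by default, so a real dataset could return more than one row for the peak month or top title.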
Seattle Public Library collects its monthly checkout data and publishes it for the public. Some of the parameters in this data set include title, publication date, checkout month and year, genre, creator, material type, and number of checkouts, among others. The data is generated from a variety of sources: Overdrive, hoopla, Freegal, and RBDigital provide electronic data for digital items, while physical item checkout data from April 2005 to September 2016 was sourced from the Legrady artwork data archives. Currently, physical item checkout data is collected from the Horizon ILS. This data set essentially acts as a monthly record of library checkouts, holding hundreds of thousands of checkouts over the years. In a sense, it acts as a marker of history from 2005 until the present.
At first glance, this data set is quite messy. Certain parameters are formatted strangely, such as Publication Year, which has around seven different potential formats that each mean different things. Many of the same books are listed under different titles depending on the material type. In addition, other fields, such as subjects, take such a wide variety of values that it may be difficult to find trends when sorting by them. There are gaps in the data here and there and plenty of NA values, and given the size of the data set, there is quite a lot to sort through.
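To illustrate the kind of cleanup a field like Publication Year can need, here is a small base R sketch that pulls the first four-digit year out of a few differently formatted strings. The example strings are hypothetical stand-ins for the formats mentioned above, and the helper name `extract_year` is made up for this sketch. Taking only the first year also discards whatever distinction the different formats carry, so this is a simplification rather than a faithful parse.

```r
# Hypothetical examples of differently formatted publication years.
raw_years <- c("2015", "c2015", "[2015]", "2015.", "[2015?]", "2014, c2015", NA)

# Illustrative helper: return the first four-digit run as an integer,
# or NA when the value is missing or contains no such run.
extract_year <- function(x) {
  has_year <- grepl("[0-9]{4}", x)
  first_year <- sub(".*?([0-9]{4}).*", "\\1", x, perl = TRUE)
  as.integer(ifelse(has_year, first_year, NA_character_))
}

clean_years <- extract_year(raw_years)
print(clean_years)
```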
There are certain variables that would’ve been useful to track alongside what has already been collected. For example, we may be able to draw conclusions about a book’s quality or success based on the number of checkouts, but there isn’t any other statistic provided that may strengthen those assumptions such as an average rating from Goodreads or something along those lines. We have no way of knowing the demographics of the people who are checking out items, as all that is tracked is information about the items themselves, so we cannot draw any conclusions about people or culture in relation to checkouts without making inferences. Ultimately, this data set is just about library checkout data, so its scope only exists within that range.
This line chart visualizes the total number of checkouts for each of the seven books across the Grishaverse series split into their respective series: the Dregs Series (Six of Crows and Crooked Kingdom), the Grisha Trilogy (Shadow and Bone, Siege and Storm, and Ruin and Rising), and the King of Scars Duology (King of Scars and Rule of Wolves).
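The grouping of the seven novels into their three series could be prepared along these lines before plotting. This is a minimal sketch on toy data with assumed column names (`Title`, `CheckoutYear`, `Checkouts`), not the report's actual chart code. The resulting table has one row per series per year, which is the shape a line chart with one line per series would need.

```r
library(dplyr)

# Toy data; column names and values are assumed for illustration only.
grishaverse <- tibble(
  Title = c("Shadow and Bone", "Siege and Storm", "Six of Crows",
            "Crooked Kingdom", "King of Scars", "Rule of Wolves",
            "Shadow and Bone"),
  CheckoutYear = c(2020, 2020, 2020, 2020, 2020, 2021, 2021),
  Checkouts = c(10, 5, 8, 7, 3, 4, 20)
)

series_totals <- grishaverse %>%
  # Assign each novel to its series, following the grouping in the text.
  mutate(Series = case_when(
    Title %in% c("Six of Crows", "Crooked Kingdom") ~ "Dregs Series",
    Title %in% c("Shadow and Bone", "Siege and Storm",
                 "Ruin and Rising") ~ "Grisha Trilogy",
    Title %in% c("King of Scars", "Rule of Wolves") ~ "King of Scars Duology"
  )) %>%
  # Total checkouts per series per year.
  group_by(Series, CheckoutYear) %>%
  summarise(total = sum(Checkouts), .groups = "drop")

print(series_totals)
```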
The most popular series in terms of checkouts is the Grisha Trilogy, followed by the Dregs Series, and finally, the King of Scars Duology. The Grisha Trilogy is the only series whose checkouts increase steadily over time, while the other two series have had periods of both increase and decrease. The Grisha Trilogy’s checkouts also spiked significantly in 2021, which can possibly be explained by the release of the Shadow and Bone Netflix adaptation in April of 2021.
This dataframe only tracks data from 2017 onwards, but it looks as though the Dregs Series had a decrease in checkouts from 2017 to 2019 after the second book, Crooked Kingdom, was released in 2016. However, its checkouts increased again in 2019, around the release of the first King of Scars novel, and spiked in 2021, around the time of the COVID pandemic, only to decrease again in 2022.
The King of Scars Duology has the lowest number of checkouts all around, although, in 2022, it started catching up to the Dregs Series. Both of these series have a significantly lower number of checkouts in comparison to the Grisha Trilogy.
If you hover over the different bars, you can view the total number of checkouts for each novel each month.
This bar chart tracks total checkouts for each Grishaverse novel from January to December of 2021. Except for January and February, when Six of Crows had the highest number of checkouts, Shadow and Bone is the most commonly checked out Grishaverse novel, while Rule of Wolves has the fewest checkouts by far. What’s most striking about this chart is the steady increase from February to April followed by a sudden spike in May, with a total of 788 checkouts across all novels compared to April’s 385 checkouts, a difference of 403 checkouts.
The number of checkouts between May and December remains in roughly the same range, with a slight decrease from May to September. October of 2021 saw the highest number of checkouts with 824 checkouts!
Include a chart. Make sure to describe why you included the chart, and what patterns emerged
The first chart that you will create and include will show the trend over time of your variable/topic/interest. Think carefully about what you want to communicate to your user (you may have to find relevant trends in the dataset first!). Here are some requirements to help guide your design:
When we say “clear” or “human readable” titles and labels, that means that you should not just display the variable name.
Here’s an example of how to run an R script inside an RMarkdown file:
Include a chart. Make sure to describe why you included the chart, and what patterns emerged
The second chart that you will create and include will show another trend over time of your variable/topic/interest. Think carefully about what you want to communicate to your user (you may have to find relevant trends in the dataset first!). Here are some requirements to help guide your design:
When we say “clear” or “human readable” titles and labels, that means that you should not just display the variable name.
Here’s an example of how to run an R script inside an RMarkdown file:
The last chart is up to you. It could be a line plot, scatter plot, histogram, bar plot, stacked bar plot, and more. Here are some requirements to help guide your design:
Here’s an example of how to run an R script inside an RMarkdown file:
#{r, echo = FALSE, code = readLines("chart2_example.R")} #